Use mudnn::Unary::IDENTITY op to accelerate D2D memory copy #12
Merged
yeahdongcn merged 1 commit into main on Jul 7, 2025
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Pull Request Overview
This PR integrates MUSA’s mudnn::Unary IDENTITY operation to accelerate device-to-device memory copies for FP16/FP32 tensors, offering ~40% performance improvements.
- Adds the `mudnnMemcpyAsync` API declaration and implementation using mudnn::Unary::IDENTITY
- Updates `cpy.cu` to route contiguous F32/F16 copies through MUSA when enabled
- Extends CMake configurations to include new sources and link against the mudnn library
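The routing described in the list above can be sketched in host-only C++. This is a minimal illustration, not the PR's code: `Tensor`, `DType`, `CopyPath`, `choose_copy_path`, `accelerated_copy`, `generic_copy`, and `copy_tensor` are all hypothetical names, the same-type and same-size checks are assumptions about what "contiguous F32/F16 copies" requires, and a plain `memcpy` stands in for the role `mudnnMemcpyAsync` plays in the PR.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for ggml's tensor metadata; not the real ggml structs.
enum class DType { F32, F16, Q8 };

struct Tensor {
    DType type;
    bool contiguous;
    std::vector<unsigned char> data;
};

enum class CopyPath { Accelerated, Generic };

// Decide which path a copy takes. Assumption: only same-type, same-size,
// contiguous F32/F16 copies qualify, and only when the feature is enabled.
inline CopyPath choose_copy_path(const Tensor& src, const Tensor& dst, bool accel_enabled) {
    const bool fast_type = src.type == DType::F32 || src.type == DType::F16;
    const bool eligible = accel_enabled && fast_type && src.type == dst.type &&
                          src.contiguous && dst.contiguous &&
                          src.data.size() == dst.data.size();
    return eligible ? CopyPath::Accelerated : CopyPath::Generic;
}

// Stand-in for the accelerated route (mudnnMemcpyAsync in the PR).
// A host memcpy keeps the sketch runnable without a MUSA device.
inline void accelerated_copy(const Tensor& src, Tensor& dst) {
    if (!src.data.empty()) {
        std::memcpy(dst.data.data(), src.data.data(), src.data.size());
    }
}

// Stand-in for the existing generic copy kernels.
inline void generic_copy(const Tensor& src, Tensor& dst) {
    dst.data.assign(src.data.begin(), src.data.end());
}

// Route the copy and report which path was taken.
inline CopyPath copy_tensor(const Tensor& src, Tensor& dst, bool accel_enabled) {
    const CopyPath path = choose_copy_path(src, dst, accel_enabled);
    if (path == CopyPath::Accelerated) {
        accelerated_copy(src, dst);
    } else {
        generic_copy(src, dst);
    }
    return path;
}
```

Keeping the eligibility check separate from the copy itself means the fallback path stays the default, so any tensor that fails a check (wrong type, non-contiguous, feature disabled) behaves exactly as it did before the change.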
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ml/backend/ggml/ggml/src/ggml-musa/mudnn.cuh | Declares mudnnMemcpyAsync |
| ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu | Implements mudnnMemcpyAsync with MUSA DNN identity op |
| ml/backend/ggml/ggml/src/ggml-musa/CMakeLists.txt | Adds mudnn headers/sources and links mudnn library |
| ml/backend/ggml/ggml/src/ggml-cuda/cpy.cu | Routes contiguous FP16/FP32 copies through mudnnMemcpyAsync |
| CMakeLists.txt | Includes mudnn in runtime dependency regex and link steps |
Comments suppressed due to low confidence (3)
ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu:88
- There are no unit tests covering mudnnMemcpyAsync; adding tests for both FLOAT and HALF copy paths would help ensure correctness and prevent regressions.
musaError_t mudnnMemcpyAsync(ggml_backend_cuda_context& ctx, const ggml_tensor* dst, const ggml_tensor* src) {
ml/backend/ggml/ggml/src/ggml-musa/mudnn.cu:1
- The file uses std::vector and std::unordered_map but does not include <vector> or <unordered_map>, which can lead to compilation failures.
#include <mutex>
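To illustrate the suppressed comment above, here is a minimal host-only sketch of a translation unit that uses `std::mutex`, `std::unordered_map`, and `std::vector` and includes each header explicitly. What mudnn.cu actually uses these containers for is not shown in this conversation, so the per-device handle cache and the dimension helper below are assumptions, and `Handle`, `get_cached_handle`, and `dims_of` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>  // std::unordered_map: included explicitly, not transitively
#include <utility>
#include <vector>         // std::vector: included explicitly, not transitively

// Hypothetical stand-in for a per-device library handle.
struct Handle {
    int device_id;
};

// Return a cached handle for the device, creating it on first use.
// The mutex guards the map so concurrent callers do not race on insertion.
inline Handle* get_cached_handle(int device_id) {
    static std::mutex cache_mutex;
    static std::unordered_map<int, std::unique_ptr<Handle>> cache;

    std::lock_guard<std::mutex> lock(cache_mutex);
    auto it = cache.find(device_id);
    if (it != cache.end()) {
        return it->second.get();
    }
    auto handle = std::make_unique<Handle>(Handle{device_id});
    Handle* raw = handle.get();
    cache.emplace(device_id, std::move(handle));
    return raw;
}

// Copy a fixed-size dimension array into a std::vector.
inline std::vector<std::int64_t> dims_of(const std::int64_t (&ne)[4]) {
    return std::vector<std::int64_t>(ne, ne + 4);
}
```

Relying on another header to pull in `<vector>` or `<unordered_map>` may compile with one toolchain and fail with another, which is why each header is listed explicitly here.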
ml/backend/ggml/ggml/src/ggml-musa/CMakeLists.txt:101
- Static builds are not linking against the mudnn library, which will cause unresolved symbol errors when mudnn code is compiled; consider linking mudnn for static configuration or disabling mudnn support in static mode.
target_link_libraries(ggml-musa PRIVATE MUSA::musart_static MUSA::mublas_static)
fishingfly approved these changes on Jul 7, 2025
yeahdongcn added 13 commits that referenced this pull request, each carrying the same Signed-off-by line, dated Jul 10, Jul 31, Aug 6, Aug 7, Aug 11, Aug 18, Aug 19, Aug 20, Aug 21, Aug 26, Sep 1, Oct 2, and Oct 13, 2025.
This is a manual merge of ggml-org/llama.cpp#13647.
Testing Done
The previous eval rate was around 7 tokens/s; this update improves it by approximately 40%.